Simple, Correct Parallelization for Blocked Gibbs Sampling
Abstract
We present a method for distributing collapsed Gibbs sampling over multiple processors that is simple, statistically correct, and memory efficient. The method uses blocked sampling: the training data are divided into relatively large blocks, and the sampling of each block is distributed over multiple processors. At the end of each parallel run, Metropolis-Hastings rejection sampling is performed to ensure that samples are drawn from the correct distribution. Empirical results on part-of-speech tagging and word segmentation tasks show that the proposed blocked sampling method samples from the true distribution while achieving convergence speed comparable to previous parallel sampling methods.
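As a concrete illustration of the pattern the abstract describes, the following is a minimal sketch in which several blocks are proposed independently from the state at the start of a sweep (as separate processors would do) and a single Metropolis-Hastings test against the exact joint then decides whether to keep the combined proposal. The toy target and helper names (log_joint, propose_block) are illustrative assumptions, not the paper's implementation.

```python
# Sketch: parallel blocked proposals corrected by a Metropolis-Hastings test.
# The target log_joint and the proposal are toy stand-ins for the collapsed
# posterior and per-block samplers described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z):
    # Toy target: latent variables coupled through their shared mean,
    # standing in for a collapsed posterior p(z | data).
    return -0.5 * np.sum((z - z.mean()) ** 2) - 0.5 * np.sum(z ** 2)

def propose_block(z, idx):
    # Propose new values for one block using only the state from the start
    # of the sweep; in a real system each call runs on its own processor.
    return z[idx] + rng.normal(scale=0.5, size=idx.size)

def parallel_blocked_sweep(z, blocks):
    # All blocks are proposed from the same stale state (embarrassingly
    # parallel), then one accept/reject step restores exactness.
    proposal = z.copy()
    for idx in blocks:
        proposal[idx] = propose_block(z, idx)
    log_accept = log_joint(proposal) - log_joint(z)  # symmetric proposal
    if np.log(rng.random()) < log_accept:
        return proposal, True
    return z, False

z = rng.normal(size=12)
blocks = np.array_split(np.arange(z.size), 3)  # three large blocks
accepted = 0
for _ in range(1000):
    z, ok = parallel_blocked_sweep(z, blocks)
    accepted += ok
print(f"acceptance rate: {accepted / 1000:.2f}")
```

Because the random-walk proposal here is symmetric, the Hastings correction cancels in the acceptance ratio; a proposal built from per-block conditionals with stale cross-block counts would also need the forward and reverse proposal densities in the ratio.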
Similar Resources
Gibbs Sampling Methods for Stick-Breaking Priors
A rich and flexible class of random probability measures, which we call stick-breaking priors, can be constructed using a sequence of independent beta random variables. Examples of random measures that have this characterization include the Dirichlet process, its two-parameter extension, the two-parameter Poisson–Dirichlet process, finite-dimensional Dirichlet priors, and beta two-parameter pro...
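For reference, the stick-breaking construction this snippet alludes to is short enough to write out; the truncation level, concentration parameter, and Gaussian base measure below are illustrative choices, not part of the original abstract.

```python
# Stick-breaking weights for a truncated Dirichlet process DP(alpha, H):
# v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j).
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 2.0, 50                        # concentration and truncation level
v = rng.beta(1.0, alpha, size=K)          # independent beta stick fractions
v[-1] = 1.0                               # close the stick at the truncation
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
atoms = rng.normal(size=K)                # draws from a toy base measure H
# The random measure is sum_k pi_k * delta(atoms_k); the two-parameter
# (Pitman-Yor) case instead draws v_k ~ Beta(1 - d, alpha + (k + 1) * d).
print(pi.sum())                           # weights sum to 1 at the truncation
```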
GD-GIBBS: a GPU-based sampling algorithm for solving distributed constraint optimization problems
Researchers have recently introduced a promising new class of Distributed Constraint Optimization Problem (DCOP) algorithms that is based on sampling. This paradigm is very amenable to parallelization since sampling algorithms require a lot of samples to ensure convergence, and the sampling process can be designed to be executed in parallel. This paper presents GPU-based D-Gibbs (GD-Gibbs), whi...
Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation
Collapsed Gibbs sampling is a frequently applied method to approximate intractable integrals in probabilistic generative models such as latent Dirichlet allocation. This sampling method, however, has the crucial drawback of high computational complexity, which limits its applicability to large data sets. We propose a novel dynamic sampling strategy to significantly improve the efficiency of co...
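The per-token conditional used by standard collapsed Gibbs sampling for LDA is the familiar count ratio, sketched below on a toy corpus; the corpus, hyperparameter values, and array names are illustrative.

```python
# Sketch: standard collapsed Gibbs for LDA on a toy corpus. ndk, nkw, and nk
# are the usual document-topic, topic-word, and topic-total count tables.
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 0]]      # toy corpus of word ids
K, V, alpha, beta = 2, 4, 0.1, 0.01      # topics, vocab size, priors

ndk = np.zeros((len(docs), K))
nkw = np.zeros((K, V))
nk = np.zeros(K)
z = [rng.integers(K, size=len(d)) for d in docs]
for d, doc in enumerate(docs):           # seed the count tables
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(100):                     # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                  # remove the current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # p(z_i = k | rest) is proportional to
            # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k                  # reassign and restore the counts
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```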
A simple non-parametric Topic Mixture for Authors and Documents
This article reviews the Author-Topic Model and presents a new non-parametric extension based on the Hierarchical Dirichlet Process. The extension is especially suitable when no prior information about the necessary number of components is available. A blocked Gibbs sampler is described, with the focus on staying as close as possible to the original model with only the minimum of theoretical and ...
SIMD parallel MCMC sampling with applications for big-data Bayesian analytics
We present a single-chain parallelization strategy for Gibbs sampling of probabilistic Directed Acyclic Graphs, where contributions from child nodes to the conditional posterior distribution of a given node are calculated concurrently. For statistical models with many independent observations, such parallelism takes a Single-Instruction-Multiple-Data form, and can therefore be efficiently imple...
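A small sketch of the single-chain idea this snippet describes: when many child observations are conditionally independent, their contributions to a node's conditional posterior reduce to one vectorized pass over the data, which is the SIMD-friendly part. The conjugate normal model and all names below are illustrative assumptions.

```python
# Sketch: per-observation contributions from conditionally independent
# children are accumulated in one vectorized, SIMD-friendly reduction while
# drawing a single node of the chain.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=1_000_000)  # child observations

def sample_mu(y, prior_var=10.0):
    # For y_i ~ N(mu, 1) with mu ~ N(0, prior_var), the conditional
    # posterior of mu is normal; y.sum() is the data-parallel reduction
    # that a SIMD or GPU implementation would spread across lanes.
    prec = y.size + 1.0 / prior_var      # posterior precision
    mean = y.sum() / prec                # posterior mean
    return rng.normal(loc=mean, scale=prec ** -0.5)

print(round(sample_mu(y), 3))            # close to the true mean of 3.0
```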